We present several novel identities and inequalities relating the mutual information and the directed information in systems with feedback. The internal blocks within such systems are restricted only to be causal mappings, but are allowed to be non-linear, stochastic and time varying. Moreover, the involved signals can be arbitrarily distributed. We bound the directed information between signals inside the feedback loop by the mutual information between signals inside and outside the feedback loop. This fundamental result has an interesting interpretation as a law of conservation of information flow. Building upon it, we derive several novel identities and inequalities, which allow us to prove some existing information inequalities under less restrictive assumptions. Finally, we establish new relationships between nested directed informations inside a feedback loop. This yields a new and general data-processing inequality for systems with feedback.



I. Introduction

The notion of directed information introduced by Massey in [1] assesses the amount of information that causally "flows" from a given random and ordered sequence to another. For this reason, it has increasingly found use in diverse applications, from characterizing the capacity of channels with feedback [1]-[4], the rate distortion function under causality constraints [5], establishing some of the fundamental limitations in networked control [6]-[11], determining causal relationships in neural networks [12], to portfolio theory and hypothesis testing [13], to name a few.

The directed information from a random¹ sequence $x^k$ to a random sequence $y^k$ is defined as

$$I(x^k \to y^k) \triangleq \sum_{i=1}^{k} I(y(i); x^i \mid y^{i-1}), \tag{1}$$

where the notation $x^i$ represents the sequence x(1), x(2), . . . , x(i). The causality inherent in this definition becomes evident when comparing it with the mutual information between $x^k$ and $y^k$, given by $I(x^k; y^k) = \sum_{i=1}^{k} I(y(i); x^k \mid y^{i-1})$. In the latter sum, what matters is the amount of information about the entire sequence $x^k$ present in y(i), given the past values $y^{i-1}$. By contrast, in the conditional mutual informations in the sum of (1), only the past and current values of $x^k$ are considered, that is, $x^i$. Thus, $I(x^k \to y^k)$ represents the amount of information causally conveyed from $x^k$ to $y^k$.
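The contrast between (1) and the mutual information can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the helper names (`marginal`, `cond_mutual_info`, `directed_info`, `mutual_info`, `feedback_bsc_pmf`) are hypothetical, the sequences are binary with k = 2, the forward channel is a binary symmetric channel with crossover probability 0.1, and the feedback rule x(2) = y(1) is chosen purely for illustration.

```python
import math
from collections import defaultdict


def marginal(pmf, key):
    """Marginalize a pmf {outcome: prob} onto key(outcome)."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[key(outcome)] += p
    return out


def cond_mutual_info(pmf, a, b, c):
    """I(A; B | C) in bits; a, b, c map an outcome to the values of A, B, C."""
    p_abc = marginal(pmf, lambda o: (a(o), b(o), c(o)))
    p_ac = marginal(pmf, lambda o: (a(o), c(o)))
    p_bc = marginal(pmf, lambda o: (b(o), c(o)))
    p_c = marginal(pmf, c)
    return sum(
        p * math.log2(p * p_c[vc] / (p_ac[(va, vc)] * p_bc[(vb, vc)]))
        for (va, vb, vc), p in p_abc.items() if p > 0
    )


def directed_info(pmf, k):
    """Massey's I(x^k -> y^k) as in (1); outcomes are (x_tuple, y_tuple)."""
    return sum(
        cond_mutual_info(
            pmf,
            lambda o, i=i: o[1][i],       # y(i), with 0-based index i
            lambda o, i=i: o[0][:i + 1],  # x^i
            lambda o, i=i: o[1][:i],      # y^{i-1}
        )
        for i in range(k)
    )


def mutual_info(pmf):
    """I(x^k; y^k), i.e. conditional mutual information with trivial conditioning."""
    return cond_mutual_info(pmf, lambda o: o[0], lambda o: o[1], lambda o: ())


def feedback_bsc_pmf(eps):
    """Toy system (assumption): x(1) uniform, y(i) = x(i) xor n(i), x(2) = y(1)."""
    pmf = defaultdict(float)
    for x1 in (0, 1):
        for n1 in (0, 1):
            for n2 in (0, 1):
                y1 = x1 ^ n1
                x2 = y1            # perfect, one-sample-delayed feedback
                y2 = x2 ^ n2
                p = 0.5 * (eps if n1 else 1 - eps) * (eps if n2 else 1 - eps)
                pmf[((x1, x2), (y1, y2))] += p
    return dict(pmf)


pmf = feedback_bsc_pmf(0.1)
di = directed_info(pmf, 2)
mi = mutual_info(pmf)
print(f"I(x^2 -> y^2) = {di:.4f} bits, I(x^2; y^2) = {mi:.4f} bits")
```

In this toy system the second channel input only repeats what the receiver already knows, so the directed information counts just the first channel use (about 0.531 bits), whereas the mutual information evaluates to 1 bit.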

There exist several results characterizing the relationship between $I(x^k \to y^k)$ and $I(x^k; y^k)$. First, it is well known that $I(x^k \to y^k) \le I(x^k; y^k)$, with equality if and only if $y^k$ is causally related to $x^k$ [1], [2]. A conservation law of mutual and directed information has been found in [14], which asserts that $I(x^k \to y^k) + I(0 * y^{k-1} \to x^k) = I(x^k; y^k)$, where $0 * y^{k-1}$ denotes the concatenation 0, y(1), . . . , y(k-1).

Given its prominence in settings involving feedback, it is perhaps in these scenarios where the directed 
information becomes most important. For instance, the directed information has been instrumental in 
characterizing the capacity of channels with feedback (see, e.g., 0, El, lfl31 and the references therein), 
as well as the rate-distortion function in setups involving feedback [5], [9]-[11], [16].

In this paper, our focus is on the relationships (inequalities and identities) involving directed and mutual 
informations within feedback systems, as well as between directed informations involving different signals 
within the corresponding feedback loop. In order to discuss some of the existing results related to this 
problem, it is convenient to consider the general feedback system shown in Fig. 1-(a). In this diagram, the blocks $S_1, \ldots, S_4$ represent possibly non-linear and time-varying causal systems such that the total delay of the loop is at least one sample. In the same figure, r, p, s, q are exogenous random signals (scalars, vectors or sequences), which could represent, for example, any combination of disturbances, noises, random initial states or side informations. We note that any of these exogenous signals, in combination with its corresponding deterministic mapping $S_i$, can also yield any desired stochastic causal mapping.

Figure 1. (a): The general system considered in this work. (b): A special case, corresponding to the closed-loop system studied in [8].

1 Hereafter we use non-italic letters (such as x) for random variables, denoting a particular realization by the corresponding italic character, x.

For the simple case in which all the systems $\{S_i\}_{i=1}^{4}$ are linear time invariant (LTI) and stable, and assuming p, s, q = 0 (deterministically), it was shown in [13] that $I(r^k \to e^k)$ does not depend on whether there is feedback from e to u or not. Inequalities between mutual and directed informations in a less restricted setup, shown in Fig. 1-(b), have been found in [7], [8]. In that setting (a networked-control system), G is a strictly causal LTI dynamic system having (vector) state sequence $\{x(i)\}_{i=0}^{\infty}$, with $x_0 = x(0)$ being the random initial state in its state-space representation. The external signal r (which could correspond to a disturbance) is statistically independent of s, the latter corresponding to, for example, side information or channel noise. Both are also statistically independent of $x_0$. The blocks labeled E, D and f correspond to an encoder, a decoder and a channel, respectively, all of which are causal. The channel f maps $s^k$ and $x^k$ to y(k) in a possibly time-varying manner, i.e., $y(k) = f(k, x^k, s^k)$. Similarly, the concatenation of the encoder, the channel and the decoder maps $s^k$ and $w^k$ to u(k) as a possibly time-dependent function $u(k) = \psi(k, w^k, s^k)$. Under these assumptions, the following fundamental result was shown in [8, Lemma 5.1]:

$$I(x_0, r^k; u^k) - I(r^k; u^k) \ge I(x_0; e^k). \tag{2}$$

By further assuming in [8] that the decoder D in Fig. 1-(b) is deterministic, the following Markov chain naturally holds,

$$(x_0, r^k) \leftrightarrow y^k \leftrightarrow u^k, \tag{3}$$

leading directly to

$$I(x_0, r^k; y^k) - I(r^k; u^k) \ge I(x_0; e^k), \tag{4}$$

which is found in the proof of [8, Corollary 5.3]. The deterministic nature of the decoder D played a crucial role in the proof of this result, since otherwise the Markov chain (3) does not hold, in general, due to the feedback from u to y.
Notice that both (2) and (4) provide lower bounds to the difference between two mutual informations, each of them relating a signal external to the loop (such as $x_0, r^k$) to a signal internal to the loop (such as $u^k$ or $y^k$). Instead, the inequality

$$I(x^k \to y^k) \ge I(r^k; y^k), \tag{5}$$

which holds for the system in Fig. 1-(a) and appears in [1, Theorem 3] (and was rediscovered later in [6, Lemma 4.8.1]), involves the directed information between two internal signals and the mutual information between the second of these and an external sequence. A related bound, similar to (4) but involving information rates and with the leftmost mutual information replaced by the directed information from $x^k$ to $y^k$ (which are two signals internal to the loop), has been obtained in [7, Lemma 4.1]:

$$\bar{I}(x \to y) - \bar{I}(r; u) \ge \lim_{k \to \infty} \frac{I(x_0; e^k)}{k}, \tag{6}$$

with $\bar{I}(x \to y) = \lim_{k \to \infty} \frac{1}{k} I(x^k \to y^k)$ and $\bar{I}(r; u) = \lim_{k \to \infty} \frac{1}{k} I(r^k; u^k)$, provided $\sup_{i \ge 0} E[x(i)^T x(i)] < \infty$. This result relies on three assumptions: a) that the channel f is memory-less and satisfies a "conditional invertibility" property, b) a finite-memory condition, and c) a fading-memory condition, the latter two related to the decoder D (see Fig. 1). It is worth noting that, as defined in [7], these assumptions upon D exclude the use of side information by the decoder and/or the possibility of D being affected by random noise or having a random internal state which is non-observable (please see [7] for a detailed description of these assumptions).

The inequality (5) has recently been extended in [4, Theorem 1], for the case of discrete-valued random variables and assuming $s \perp (r, p, q)$, as the following identity (written in terms of the signals and setup shown in Fig. 1-(a)):

$$I(x^k \to y^k) = I(p^k; y^k) + I(x^k \to y^k \mid p^k). \tag{7}$$

Letting q = s in Fig. 1-(a) and with the additional assumption that $(p, s) \perp q$, it was also shown in [4, Theorem 1] that

$$I(x^k \to y^k) = I(p^k; y^k) + I(q^{k-1}; y^k) + I(p^k; q^{k-1} \mid y^k), \tag{8}$$

for the cases in which u(i) = y(i) + q(i) (i.e., when the concatenation of $S_4$ and $S_1$ corresponds to a summing node). In [4], (7) and (8) play important roles in characterizing the capacity of channels with noisy feedback.

To the best of our knowledge, (2), (4), (5), (6), (7) and (8) are the only results available in the literature which lower bound the difference between an internal-to-internal directed information and an external-to-internal mutual information. There exist even fewer published results in relation to inequalities between two directed informations involving only signals internal to the loop. To the best of our knowledge, the only inequality of this type in the literature is the one found in the proof of Theorem 4.1 of [9]. The latter takes the form of a (quasi) data-processing inequality for directed informations in closed-loop systems, and states that

$$I(x^k \to y^k \,\|\, q^k) \ge I(x^k \to u^k), \tag{9}$$

provided² $q \perp (r, p)$ and if $S_4$ is such that $y^i$ is a function of $(u^i, q^i)$ (i.e., if $S_4$ is conditionally invertible) for all i. In (9),

$$I(x^k \to y^k \,\|\, q^k) \triangleq \sum_{i=1}^{k} I(y(i); x^i \mid y^{i-1}, q^i) \tag{10}$$

corresponds to the causally conditioned directed information defined in [2]. Inequality (9) plays a crucial role in [9], since it allowed lower bounding the average data rate across a digital error-free channel by a directed information. (In [9], q corresponded to a random dither signal in an entropy-coded dithered quantizer.)

In this paper, we derive a set of information identities and inequalities involving pairs of sequences (internal or external to the loop) in feedback systems. The first of these is an identity which, under an independence condition, can be interpreted as a law of conservation of information flows. The latter identity is the starting point for most of the results which follow it. Among other things, we extend (4) and (6) to the general setup depicted in Fig. 1-(a), where none of the assumptions made in [7]-[9] (except causality) needs to hold. Moreover, we will prove the validity of (9) without assuming the conditional invertibility of $S_4$ nor that $q \perp (r, p)$. The latter result is one of four novel data-processing inequalities derived in Section III-B, each involving two nested directed informations valid for the system depicted in Fig. 1-(a). The last of these is a complete closed-loop counterpart of the traditional open-loop data-processing inequality.

The remainder of this paper begins with a description of the systems under study and the extension of Massey's directed information to the case in which each of the blocks in the loop may introduce an arbitrary, non-negative delay (i.e., we do not allow for anticipation). The information identities and inequalities are presented in Section III. For clarity of the exposition, all the proofs are deferred to Section IV. A brief discussion of potential applications of our results is presented in Section V, which is followed by the conclusions in Section VI.

2 Here, and in the sequel, we use the notation $x \perp y$ to mean "x is independent of y".
II. Preliminaries

A. System Description 

We begin by providing a formal description of the systems labeled $S_1, \ldots, S_4$ in Fig. 1-(a). Their input-output relationships are given by the possibly time-varying deterministic mappings³

$$e(i) = S_1(u^{i-d_1(i)}, r^i), \tag{11a}$$
$$x(i) = S_2(e^{i-d_2(i)}, p^i), \tag{11b}$$
$$y(i) = S_3(x^{i-d_3(i)}, s^i), \tag{11c}$$
$$u(i) = S_4(y^{i-d_4(i)}, q^i), \tag{11d}$$

where r, p, s, q are exogenous random signals and the (possibly time-varying) delays $d_1, d_2, d_3, d_4 \in \{0, 1, \ldots\}$ are such that

$$d_1(k) + d_2(k) + d_3(k) + d_4(k) \ge 1, \quad \forall k \in \mathbb{N}.$$

That is, the concatenation of $S_1, \ldots, S_4$ has a delay of at least one sample. For every $i \in \{1, \ldots, k\}$, $r(i) \in \mathbb{R}^{n_r(i)}$, i.e., r(i) is a real random vector whose dimension is given by some function $n_r : \{1, \ldots, k\} \to \mathbb{N}$. The other sequences (q, p, s, x, y, u) are defined likewise.

B. A Necessary Modification of the Definition of Directed Information 

As stated in [1], the directed information (as defined in (1)) is a more meaningful measure of the flow of information between $x^k$ and $y^k$ than the conventional mutual information $I(x^k; y^k) = \sum_{i=1}^{k} I(y(i); x^k \mid y^{i-1})$ when there exists causal feedback from y to x. In particular, if $x^k$ and $y^k$ are discrete-valued sequences, input and output, respectively, of a forward channel, and if there exists strictly causal, perfect feedback, so that x(i) = y(i-1) (a scenario utilized in [1] as part of an argument in favor of the directed information), then the mutual information becomes

$$I(x^k; y^k) = H(y^k) - H(y^k \mid x^k) = H(y^k) - H(y^k \mid y^{k-1}) = H(y^k) - H(y(k) \mid y^{k-1}) = H(y^{k-1}).$$

Thus, when strictly causal feedback is present, $I(x^k; y^k)$ fails to account for how much information about $x^k$ has been conveyed to $y^k$ through the forward channel that lies between them.

It is important to note that, in [1] (as well as in many works concerned with communications), the forward channel is instantaneous, i.e., it has no delay. Therefore, if a feedback channel is utilized, then this feedback channel must have a delay of at least one sample, as in the example above. However, when studying the system in Fig. 1-(a), we may need to evaluate the directed information between signals $x^k$ and $y^k$ which are, respectively, input and output of a strictly causal forward channel (i.e., with a delay of at least one sample), whose output is instantaneously fed back to its input. In such a case, if one further assumes perfect feedback and sets x(i) = y(i), then, in the same spirit as before,

$$I(x^k \to y^k) = \sum_{i=1}^{k} I(y(i); x^i \mid y^{i-1}) = \sum_{i=1}^{k} \left[ H(y(i) \mid y^{i-1}) - H(y(i) \mid x^i, y^{i-1}) \right] = H(y^k).$$

As one can see, Massey's definition of directed information ceases to be meaningful if instantaneous feedback is utilized.

It is natural to solve this problem by recalling that, in the latter example, the forward channel had a delay, say d, of at least one sample. Therefore, if we are interested in measuring how much of the information in y(i), not present in $y^{i-1}$, was conveyed from $x^i$ through the forward channel, we should look at the mutual information $I(y(i); x^{i-d} \mid y^{i-1})$, because only the input samples $x^{i-d}$ can have an influence on y(i). For this reason, we introduce the following, modified notion of directed information.

3 For notational simplicity, we omit writing their time dependency explicitly.

Definition 1 (Directed Information with Forward Delay): In this paper, the directed information from $x^k$ to $y^k$ through a forward channel with a non-negative time-varying delay of d(i) samples is defined as

$$I(x^k \to y^k) \triangleq \sum_{i=1}^{k} I(y(i); x^{i-d(i)} \mid y^{i-1}). \tag{12}$$

For a zero-delay forward channel, the latter definition coincides with Massey's.
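The effect of the forward delay in Definition 1 can be illustrated on a two-sample toy loop. The following is a minimal sketch under stated assumptions, not the authors' construction: the helper names are hypothetical, the delay is taken constant (d(i) = 1), the forward channel is y(i) = x(i-1) xor n(i) with x(0) = 0 and i.i.d. binary noise n, and the feedback is instantaneous and perfect, x(i) = y(i).

```python
import itertools
import math
from collections import defaultdict


def marginal(pmf, key):
    """Marginalize a pmf {outcome: prob} onto key(outcome)."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[key(outcome)] += p
    return out


def cond_mutual_info(pmf, a, b, c):
    """I(A; B | C) in bits; a, b, c map an outcome to the values of A, B, C."""
    p_abc = marginal(pmf, lambda o: (a(o), b(o), c(o)))
    p_ac = marginal(pmf, lambda o: (a(o), c(o)))
    p_bc = marginal(pmf, lambda o: (b(o), c(o)))
    p_c = marginal(pmf, c)
    return sum(
        p * math.log2(p * p_c[vc] / (p_ac[(va, vc)] * p_bc[(vb, vc)]))
        for (va, vb, vc), p in p_abc.items() if p > 0
    )


def directed_info(pmf, k, delay=0):
    """Sum over i = 1..k of I(y(i); x^{i-delay} | y^{i-1}); delay=0 is Massey's definition."""
    total = 0.0
    for i in range(1, k + 1):
        m = max(i - delay, 0)                 # number of x samples allowed at time i
        total += cond_mutual_info(
            pmf,
            lambda o, i=i: o[1][i - 1],       # y(i)
            lambda o, m=m: o[0][:m],          # x^{i-delay}
            lambda o, i=i: o[1][:i - 1],      # y^{i-1}
        )
    return total


def loop_pmf(eps, k=2):
    """Toy loop (assumption): y(i) = x(i-1) xor n(i), x(0) = 0, x(i) = y(i)."""
    pmf = defaultdict(float)
    for noise in itertools.product((0, 1), repeat=k):
        x, y, prev_x, prob = [], [], 0, 1.0
        for n in noise:
            yi = prev_x ^ n                   # forward channel with one-sample delay
            xi = yi                           # instantaneous perfect feedback
            x.append(xi)
            y.append(yi)
            prev_x = xi
            prob *= eps if n else 1 - eps
        pmf[(tuple(x), tuple(y))] += prob
    return dict(pmf)


pmf = loop_pmf(0.1)
massey = directed_info(pmf, 2, delay=0)   # evaluates to H(y^2) here
delayed = directed_info(pmf, 2, delay=1)  # evaluates to 0 here
print(f"delay 0: {massey:.4f} bits, delay 1: {delayed:.4f} bits")
```

With delay 0 the sum collapses to the entropy of the output sequence, although x carries no exogenous information in this loop; with the delay set to that of the forward channel the result is zero, which is the behaviour the modified definition is meant to capture.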

Likewise, we adapt the definition of causally-conditioned directed information to the definition

$$I(x^k \to y^k \,\|\, e^k) \triangleq \sum_{i=1}^{k} I(y(i); x^{i-d_3(i)} \mid y^{i-1}, e^{i-d_2(i)}),$$

when the signals e, x and y are related according to (11).

Before finishing this section, it is convenient to recall the following identity (a particular case of the chain rule of conditional mutual information [18]), which will be extensively utilized in the proofs of our results:

$$I(a, b; c \mid d) = I(b; c \mid d) + I(a; c \mid b, d). \tag{13}$$

III. Information Identities and Inequalities 

A. Relationships Between Mutual and Directed Informations 

We begin by stating a fundamental result, which relates the directed information between two signals 
within a feedback loop, say x and y, to the mutual information between an external set of signals and y: 



January 29, 2013 



DRAFT 






Theorem 1: In the system shown in Fig. 1-(a), it holds that

$$I(x^k \to y^k) = I(q^k, r^k, p^k \to y^k) - I(q^k, r^k, p^k \to y^k \,\|\, x^k) \le I(p^k, q^k, r^k; y^k), \quad \forall k \in \mathbb{N}, \tag{14}$$

with equality achieved if s is independent of (p, q, r). ▲

This fundamental result, which for the cases in which $s \perp (p, q, r)$ can be understood as a law of conservation of information flow, is illustrated in Fig. 2. For such cases, the information causally conveyed from x to y equals the information flow from (q, r, p) to y. When (p, q, r) are not independent of s, part of the mutual information between (p, q, r) and y (corresponding to the term $I(q^k, r^k, p^k \to y^k \,\|\, x^k)$) can be thought of as being "leaked" through s, thus bypassing the forward link from x to y. This provides an intuitive interpretation for (14).
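The conservation reading of Theorem 1 can be checked by brute-force enumeration on a small loop. The sketch below is only an illustration under assumptions that are not in the paper: binary signals, k = 2, p and q absent, S1 given by e(i) = u(i-1) xor r(i) with u(0) = 0, S2 and S4 identity maps, and S3 given by y(i) = x(i) xor s(i) with s independent of r. All function names are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def marginal(pmf, key):
    """Marginalize a pmf {outcome: prob} onto key(outcome)."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[key(outcome)] += p
    return out


def cond_mutual_info(pmf, a, b, c):
    """I(A; B | C) in bits; a, b, c map an outcome to the values of A, B, C."""
    p_abc = marginal(pmf, lambda o: (a(o), b(o), c(o)))
    p_ac = marginal(pmf, lambda o: (a(o), c(o)))
    p_bc = marginal(pmf, lambda o: (b(o), c(o)))
    p_c = marginal(pmf, c)
    return sum(
        p * math.log2(p * p_c[vc] / (p_ac[(va, vc)] * p_bc[(vb, vc)]))
        for (va, vb, vc), p in p_abc.items() if p > 0
    )


def closed_loop_pmf(pr=0.3, ps=0.1, k=2):
    """Joint pmf of (r^k, x^k, y^k) for the toy loop described in the lead-in."""
    pmf = defaultdict(float)
    for r in itertools.product((0, 1), repeat=k):
        for s in itertools.product((0, 1), repeat=k):
            u_prev, x, y = 0, [], []
            for ri, si in zip(r, s):
                e = u_prev ^ ri      # S1: one-sample delay on u
                xi = e               # S2: identity (no p)
                yi = xi ^ si         # S3: zero-delay noisy forward link
                u_prev = yi          # S4: identity (no q)
                x.append(xi)
                y.append(yi)
            prob = math.prod(pr if b else 1 - pr for b in r)
            prob *= math.prod(ps if b else 1 - ps for b in s)
            pmf[(r, tuple(x), tuple(y))] += prob
    return dict(pmf)


k = 2
pmf = closed_loop_pmf(k=k)

# Directed information from x to y (zero forward delay, so (12) reduces to (1)).
di_xy = sum(
    cond_mutual_info(
        pmf,
        lambda o, i=i: o[2][i],       # y(i)
        lambda o, i=i: o[1][:i + 1],  # x^i
        lambda o, i=i: o[2][:i],      # y^{i-1}
    )
    for i in range(k)
)

# Mutual information between the exogenous input r^k and y^k.
mi_ry = cond_mutual_info(pmf, lambda o: o[0], lambda o: o[2], lambda o: ())

print(f"I(x^k -> y^k) = {di_xy:.6f} bits, I(r^k; y^k) = {mi_ry:.6f} bits")
```

Because s is independent of r in this toy loop and r is the only exogenous signal entering outside the link from x to y, the two printed quantities coincide, as the equality case of (14) predicts.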



Figure 2. The flow of information between exogenous signals (p, q, r) and the internal signal y equals the directed information from $x^k$ to $y^k$ when $s \perp (p, q, r)$.



Remark 1: Theorem 1 implies that $I(x^k \to y^k)$ is only a part of (or at most equal to) the information "flow" between all the exogenous signals entering the loop outside the link x → y (namely (q, r, p)), and y. In particular, if (p, q, r) were deterministic, then $I(x^k \to y^k) = 0$, regardless of the blocks $S_1, \ldots, S_4$ and irrespective of the nature of s. ▲

Remark 2: By using (13), $I(p^k, q^k, r^k; y^k) = I(r^k; y^k) + I(p^k, q^k; y^k \mid r^k)$. Then, applying Theorem 1, we recover (5) whenever $s \perp (q, r, p)$. Thus, [1, Theorem 3] (and [6, Lemma 4.8.1]) can be obtained as a corollary of Theorem 1. ▲
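The step in Remark 2 can be written out explicitly. The following worked derivation is a sketch that assumes s is independent of (p, q, r), so that Theorem 1 holds with equality, and uses only the chain rule (13) and the non-negativity of conditional mutual information:

```latex
\begin{align*}
I(x^k \to y^k)
  &= I(p^k, q^k, r^k; y^k)
     && \text{(Theorem 1, with } s \perp (p, q, r)\text{)} \\
  &= I(r^k; y^k) + I(p^k, q^k; y^k \mid r^k)
     && \text{(chain rule (13))} \\
  &\ge I(r^k; y^k)
     && \text{(non-negativity of } I(p^k, q^k; y^k \mid r^k)\text{)}
\end{align*}
```

The last line is inequality (5), and the gap is exactly the conditional term $I(p^k, q^k; y^k \mid r^k)$.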

The following result provides an inequality relating $I(x^k \to y^k)$ with the separate flows of information $I(r^k; y^k)$ and $I(p^k, q^k; y^k)$.

Theorem 2: For the system shown in Fig. 1-(a), if $s \perp (p, q, r)$ and $r^k \perp (p^k, q^k)$, then

$$I(x^k \to y^k) \ge I(r^k; y^k) + I(p^k, q^k; y^k), \tag{15}$$

with equality if and only if the Markov chain $(p^k, q^k) \leftrightarrow y^k \leftrightarrow r^k$ holds. ▲
Theorem 2 shows that, provided $(p, q, r) \perp s$, $I(x^k \to y^k)$ is lower bounded by the sum of the individual flows from all the subsets in any given partition of $(p^k, q^k, r^k)$ to $y^k$, provided these subsets are mutually independent. Indeed, both Theorems 1 and 2 can be generalized for any appropriate choice of external and internal signals. More precisely, let $\Theta$ be the set of all external signals in a feedback system. Let $\alpha$ and $\beta$ be two internal signals in the loop. Define $\Theta_{\alpha,\beta} \subset \Theta$ as the set of exogenous signals which are introduced to the loop at every subsystem $S_i$ that lies in the path going from $\alpha$ to $\beta$. Thus, for any $\rho \in \Theta \setminus \Theta_{\alpha,\beta}$, if $\Theta_{\alpha,\beta} \perp \Theta \setminus \Theta_{\alpha,\beta}$, we have that (14) and (15) become

$$I(\alpha \to \beta) = I(\Theta \setminus \{\Theta_{\alpha,\beta}\}; \beta), \tag{16}$$
$$I(\alpha \to \beta) - I(\rho; \beta) \ge I(\Theta \setminus \{\rho \cup \Theta_{\alpha,\beta}\}; \beta), \tag{17}$$

respectively.

To finish this section, we present a stronger, non-asymptotic version of inequality (6):

Theorem 3: In the system shown in Fig. 1-(a), if (r, p, q, s) are mutually independent, then

$$I(x^k \to y^k) = I(r^k; u^k) + I(p^k; e^k) + I(q^k; y^k) + I(p^k; u^k \mid e^k) + I(r^k, p^k; y^k \mid u^k). \tag{18}$$

▲

As anticipated, Theorem 3 can be seen as an extension of (6) to the more general setup shown in Fig. 1-(a), where the assumptions made in [7, Lemma 4.1] do not need to hold. In particular, letting the decoder D and $x_0$ in Fig. 1-(b) correspond to $S_4$ and $p^k$ in Fig. 1-(a), respectively, we see that inequality (18) holds even if D and E have dependent initial states, or if the internal state of D is not observable [19].

Theorem 3 also admits an interpretation in terms of information flows. This can be appreciated in the diagram shown in Fig. 3, which depicts the individual full-turn flows (around the entire feedback loop) stemming from q, r and p. Theorem 3 states that the sum of these individual flows is a lower bound for the directed information from x to y, provided q, r, p, s are independent.



Figure 3. A representation of the first three information flows on the right-hand side of (18).






B. Relationships Between Nested Directed Informations 

This section presents three closed-loop versions of the data processing inequality relating two directed informations, both between pairs of signals internal to the loop. As already mentioned in Section I, to the best of our knowledge, the first inequality of this type to appear in the literature is the one in [9, Theorem 4.1] (see (9)). Recall that the latter result stated that $I(x^k \to y^k \,\|\, q^k) \ge I(x^k \to u^k)$, requiring $S_4$ to be such that $y^i$ is a deterministic function of $(u^i, q^i)$ and that $q \perp (r, p)$. The following result presents another inequality which also relates two nested directed informations, namely, $I(x^k \to y^k)$ and $I(e^k \to y^k)$, but requiring only that $s \perp (q, r, p)$.

Theorem 4: For the closed-loop system in Fig. 1-(b), if $(q, r, p) \perp s$, then

$$I(x^k \to y^k) \ge I(e^k \to y^k). \tag{19}$$

▲

Notice that Theorem 4 does not require p to be independent of r or q. This may seem counter-intuitive upon noting that p enters the loop between the link from e to x.

The following theorem is an identity between two directed informations involving only internal signals. It can also be seen as a complement to Theorem 4 since it can be directly applied to establish the relationship between $I(e^k \to y^k)$ and $I(e^k \to u^k)$.

Theorem 5: For the system shown in Fig. 1-(a), if $(q, s) \perp (r, p)$, then

$$I(x^k \to y^k) \le I(x^k \to u^k) + I(q^k; y^k) + I(r^k, p^k; y^k \mid u^k) + I(q^k; r^k \mid u^k, y^k), \tag{20}$$

with equality if, in addition, $q \perp s$. In the latter case, it holds that

$$I(x^k \to y^k) = I(x^k \to u^k) + I(q^k; y^k) + I(r^k, p^k; y^k \mid u^k). \tag{21}$$

▲

Notice that, by requiring additional independence conditions upon the exogenous signals (specifically, $q \perp s$), Theorem 5 (and, in particular, (21)) yields

$$I(x^k \to y^k) \ge I(x^k \to u^k), \tag{22}$$

which strengthens the inequality in [9, Theorem 4.1] (stated above in (9)). More precisely, (22) does not require conditioning one of the directed informations and holds irrespective of the invertibility of the mappings in the loop.

A closer counterpart of (9) (i.e., of [9, Theorem 4.1]), involving $I(x^k \to y^k \,\|\, q^k)$, is presented next.
Theorem 6: For the system shown in Fig. 1-(a), if $(q, s) \perp (r, p)$, then

$$I(x^k \to y^k \mid q^k) = I(x^k \to u^k) + I(r^k, p^k; y^k \mid u^k) + I(q^k; r^k \mid u^k, y^k) \overset{(\dagger)}{=} I(x^k \to y^k \,\|\, q^k), \tag{23}$$

where the equality labeled (†) holds if, in addition, the Markov chain

$$q_{i+1}^{k} \leftrightarrow q^i \leftrightarrow s^i \tag{24}$$

is satisfied for all $i \in \{1, \ldots, k\}$. ▲

Thus, provided $(q, s) \perp (r, p)$, (23) yields that (9) holds regardless of the invertibility of $S_4$, requiring instead that, for all $i \in \{1, \ldots, k\}$, any statistical dependence between $q^k$ and $s^i$ resides only in $q^i$ (i.e., that the Markov chain (24) holds).

The results derived so far relate directed informations having either the same "starting" sequence or the same "destination" sequence. We finish this section with the following corollary, which follows directly by combining Theorems 4 and 5 and relates directed informations involving four different sequences internal to the loop.

Corollary 1 (Full Closed-Loop Directed Data Processing Inequality): For the system shown in Fig. 1-(a), if $(q, s) \perp (r, p)$ and $q \perp s$, then

$$I(x^k \to y^k) \overset{(a)}{\ge} I(e^k \to u^k) + I(q^k; y^k) + I(r^k; y^k \mid u^k) \ge I(e^k \to u^k). \tag{25}$$

Equality holds in (a) if, in addition, $r \perp p$ (i.e., if (q, r, p, s) are mutually independent). ▲

To the best of our knowledge, Corollary 1 is the first result available in the literature providing a lower bound to the gap between two nested directed informations, involving four different signals inside the feedback loop. This result can be seen as the first full extension of the open-loop (traditional) data-processing inequality to arbitrary closed-loop scenarios. (Notice that there is no need to consider systems with more than four mappings, since all external signals entering the loop between a given pair of internal signals can be regarded as exogenous inputs to a single equivalent deterministic mapping.)

IV. Proofs 

We start with the proof of Theorem 1.

Proof of Theorem 1: It is clear from Fig. 1-(a) and from (11) that the relationship between r, p, q, s, x and y can be represented by the diagram shown in Fig. 4. From this diagram and Lemma 1 (in the appendix) it follows that if s is independent of (r, p, q), then the following Markov chain holds:

$$y(i) \leftrightarrow (x^{i-d_3(i)}, y^{i-1}) \leftrightarrow (p^i, q^i, r^i). \tag{26}$$

Figure 4. Representation of the system of Fig. 1-(b) highlighting the dependency between p, q, r, s, x and y. The dependency on i of the delays $d_1(i), \ldots, d_4(i)$ is omitted for clarity.



Denoting the triad of exogenous signals $p^k, q^k, r^k$ by

$$\theta^k \triangleq (p^k, q^k, r^k), \tag{27}$$

we have the following:

$$\begin{aligned}
I(x^k \to y^k) &= \sum_{i=1}^{k} I(y(i); x^{i-d_3(i)} \mid y^{i-1}) \\
&\overset{(a)}{=} \sum_{i=1}^{k} \left[ I(\theta^i; y(i) \mid y^{i-1}) - I(\theta^i; y(i) \mid x^{i-d_3(i)}, y^{i-1}) \right] &\quad& (28a) \\
&\overset{(b)}{\le} \sum_{i=1}^{k} I(\theta^i; y(i) \mid y^{i-1}) && (28b) \\
&\overset{(c)}{\le} I(\theta^k; y^k). && (28c)
\end{aligned}$$

In the above, (a) follows from the fact that, if $y^{i-1}$ is known, then $x^{i-d_3(i)}$ is a deterministic function of $\theta^i$. The resulting sums on the right-hand side of (28a) correspond to $I(q^k, r^k, p^k \to y^k) - I(q^k, r^k, p^k \to y^k \,\|\, x^k)$, thereby proving the first part of the theorem, i.e., the equality in (14). In turn, (b) stems from the non-negativity of mutual informations, turning into equality if $s \perp (r, p, q)$, as a direct consequence of the Markov chain in (26). Finally, equality holds in (c) if $s \perp (q, r, p)$, since y depends causally upon $\theta$. This shows that equality in (14) is achieved if $s \perp (q, r, p)$, completing the proof. ■
Proof of Theorem 2: Apply the chain-rule identity (13) to the RHS of (14) to obtain

$$I(\theta^k; y^k) = I(p^k, q^k, r^k; y^k) = I(p^k, q^k; y^k \mid r^k) + I(r^k; y^k). \tag{29}$$

Now, applying (13) twice, one can express the term $I(p^k, q^k; y^k \mid r^k)$ as follows:

$$\begin{aligned}
I(p^k, q^k; y^k \mid r^k) &= I(p^k, q^k; y^k, r^k) - I(p^k, q^k; r^k) = I(p^k, q^k; y^k, r^k) \\
&= I(p^k, q^k; y^k) + I(p^k, q^k; r^k \mid y^k),
\end{aligned} \tag{30}$$

where the second equality follows since $(p^k, q^k) \perp r^k$. The result then follows directly by combining (30) with (29) and (14). ■
Proof of Theorem 3: Since $q \perp (r, p, s)$,

$$\begin{aligned}
I(x^k \to y^k) &\overset{(a)}{=} I(x^k \to u^k) + I(q^k; y^k) + I(r^k, p^k; y^k \mid u^k) &\quad& (31) \\
&\overset{(b)}{=} I(r^k, p^k; u^k) + I(q^k; y^k) + I(r^k, p^k; y^k \mid u^k) && (32) \\
&\overset{(c)}{=} I(r^k; u^k) + I(p^k; u^k \mid r^k) + I(q^k; y^k) + I(r^k, p^k; y^k \mid u^k), && (33)
\end{aligned}$$

where (a) is due to Theorem 5, (b) follows from Theorem 1 and the fact that $(s, q) \perp (r, p)$, and (c) from the chain rule of mutual information. For the second term on the RHS of the last equation, we have

$$\begin{aligned}
I(p^k; u^k \mid r^k) &\overset{(a)}{=} I(p^k; u^k \mid r^k) + I(p^k; r^k) \overset{(b)}{=} I(p^k; r^k, u^k) &\quad& (34) \\
&= I(p^k; r^k, u^k, e^k) - I(p^k; e^k \mid r^k, u^k) && (35) \\
&\overset{(c)}{=} I(p^k; r^k, u^k, e^k) && (36) \\
&\overset{(d)}{=} I(p^k; e^k) + I(p^k; r^k, u^k \mid e^k) && (37) \\
&\overset{(e)}{=} I(p^k; e^k) + I(p^k; u^k \mid e^k) + I(p^k; r^k \mid u^k, e^k) && (38) \\
&\overset{(f)}{=} I(p^k; e^k) + I(p^k; u^k \mid e^k), && (39)
\end{aligned}$$

where (a) holds since $r \perp p$; (b), (d) and (e) stem from the chain rule of mutual information (13); and (c) is a consequence of the Markov chain $e^k \leftrightarrow (u^k, r^k) \leftrightarrow p^k$, which is due to the fact that $e^k = S_1(u^{k-d_1(k)}, r^k)$. Finally, (f) is due to the Markov chain $r^k \leftrightarrow (u^k, e^k) \leftrightarrow p^k$, which holds because $r \perp (p, s, q)$ as a consequence of Lemma 1 in the appendix (see also Fig. 1-(a)). Substitution of (39) into (33) yields (18), thereby completing the proof. ■
Proof of Theorem 4: Since $(p, q, r) \perp s$, we can apply (5) (where now (q, r) plays the role of r), and obtain

$$I(x^k \to y^k) \ge I(q^k, r^k; y^k). \tag{40}$$

Now, we apply Theorem 1, which gives

$$I(q^k, r^k; y^k) \ge I(e^k \to y^k), \tag{41}$$

completing the proof. ■

Proof of Theorem 5: Applying Theorem 1, since $(r, p) \perp (s, q)$,

$$I(x^k \to u^k) = I(r^k, p^k; u^k). \tag{42}$$

For the other directed information, we have that

$$\begin{aligned}
I(x^k \to y^k) &\overset{(a)}{\le} I(r^k, p^k, q^k; y^k) \\
&\overset{(13)}{=} I(q^k; y^k) + I(r^k, p^k; y^k \mid q^k) &\quad& (43) \\
&\overset{(13)}{=} I(q^k; y^k) + I(r^k, p^k; u^k, y^k \mid q^k) - I(r^k, p^k; u^k \mid q^k, y^k) \\
&\overset{(b)}{=} I(q^k; y^k) + I(r^k, p^k; u^k, y^k \mid q^k) \\
&\overset{(13)}{=} I(q^k; y^k) + I(r^k, p^k; u^k, y^k, q^k) - I(r^k, p^k; q^k) \\
&\overset{(13)}{=} I(q^k; y^k) + I(r^k, p^k; u^k) + I(r^k, p^k; y^k, q^k \mid u^k) - I(r^k, p^k; q^k) \\
&\overset{(c)}{\le} I(q^k; y^k) + I(r^k, p^k; u^k) + I(r^k, p^k; y^k, q^k \mid u^k) \\
&\overset{(13)}{=} I(q^k; y^k) + I(r^k, p^k; u^k) + I(r^k, p^k; y^k \mid u^k) + I(q^k; r^k \mid u^k, y^k) && (44) \\
&\overset{(d)}{\ge} I(q^k; y^k) + I(r^k, p^k; u^k) + I(r^k, p^k; y^k \mid u^k), && (45)
\end{aligned}$$

where (a) follows from Theorem 1, which also states that equality is reached if and only if $(r, p, q) \perp s$. In turn, (b) is due to the fact that $u^k$ is a deterministic function of $q^k, y^k$. Equality in (c) holds if and only if $(r, p) \perp q$. Step (d) drops the non-negative term $I(q^k; r^k \mid u^k, y^k)$ and, from Lemma 1 (in the appendix), turns into equality if $q \perp (r, p, s)$. Substitution of (42) into (45) yields (21), completing the proof. ■
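Proof of Theorem 6: We begin with the second part of the theorem, proving the validity of the equality (†) in (23). We have the following:

$$\begin{aligned}
I(x^k \to y^k \,\|\, q^k) &= \sum_{i=1}^{k} I(y(i); x^{i-d_3(i)} \mid y^{i-1}, q^i) &\quad& (46) \\
&= \sum_{i=1}^{k} \left[ I(r^i, p^i, x^{i-d_3(i)}; y(i) \mid y^{i-1}, q^i) - I(r^i, p^i; y(i) \mid x^{i-d_3(i)}, y^{i-1}, q^i) \right] && (47) \\
&\overset{(a)}{\le} \sum_{i=1}^{k} I(r^i, p^i, x^{i-d_3(i)}; y(i) \mid y^{i-1}, q^i) && (48) \\
&\overset{(b)}{=} \sum_{i=1}^{k} I(r^i, p^i; y(i) \mid y^{i-1}, q^i) && (49) \\
&= \sum_{i=1}^{k} \left[ I(r^i, p^i, q_{i+1}^{k}; y(i) \mid y^{i-1}, q^i) - I(q_{i+1}^{k}; y(i) \mid y^{i-1}, q^i, r^i, p^i) \right] && (50) \\
&\overset{(c)}{=} \sum_{i=1}^{k} I(r^i, p^i, q_{i+1}^{k}; y(i) \mid y^{i-1}, q^i) && (51) \\
&= \sum_{i=1}^{k} \left[ I(r^i, p^i; y(i) \mid y^{i-1}, q^k) + I(q_{i+1}^{k}; y(i) \mid y^{i-1}, q^i) \right] && (52) \\
&\overset{(d)}{=} \sum_{i=1}^{k} I(r^i, p^i; y(i) \mid y^{i-1}, q^k) && (53) \\
&\overset{(e)}{\le} \sum_{i=1}^{k} I(r^k, p^k; y(i) \mid y^{i-1}, q^k) = I(r^k, p^k; y^k \mid q^k), && (54)
\end{aligned}$$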

where equality holds in (a) if and only if the Markov chain $s^i \leftrightarrow q^i \leftrightarrow (r^i, p^i)$
holds for all $i \in \{1, \dots, k\}$ (as a straightforward extension of Lemma [Q]). In our case, the latter
Markov chain holds since we are assuming $(q^k, s^k) \perp (r^k, p^k)$. In turn, (b) stems from the fact that,
for all $i \in \{1, \dots, k\}$, $x^{i-d_3(i)}$ is a function of $y^{i-1}, q^i, r^i, p^i$. To prove (c), we resort
to ([P3l and write

$$ I\big(q_{i+1}^k; y(i) \mid y^{i-1}, q^i, r^i, p^i\big)
   = I\big(q_{i+1}^k; y^i, r^i, p^i \mid q^i\big) - I\big(q_{i+1}^k; y^{i-1}, r^i, p^i \mid q^i\big). \tag{55} $$

From the definitions of the blocks (in CED), it can be seen that, given $q^i$, the triad of random sequences
$(y^i, r^i, p^i)$ is a deterministic function of (at most) $(s^i, r^i, p^i)$. Recalling that
$(q^k, s^k) \perp (r^k, p^k)$ and that $q_{i+1}^k \leftrightarrow q^i \leftrightarrow s^i$ (see (24)), it readily
follows that $q_{i+1}^k \leftrightarrow q^i \leftrightarrow (r^i, p^i, s^i)$, and thus each of the mutual
informations on the right-hand side of (55) is zero. To verify the validity of (d), we use ( [TBI and obtain

$$ I\big(q_{i+1}^k; y(i) \mid y^{i-1}, q^i\big)
   = I\big(q_{i+1}^k; y^i \mid q^i\big) - I\big(q_{i+1}^k; y^{i-1} \mid q^i\big), \tag{56} $$

where (d) now follows since
$0 \le I(q_{i+1}^k; y^{i-1} \mid q^i) \le I(q_{i+1}^k; y^i \mid q^i) \le I(q_{i+1}^k; y^i, r^i, p^i \mid q^i)$, where the
last term in this chain of inequalities was shown to be zero in the proof of (c). Equality holds in (e) if
and only if $(r^k, p^k) \leftrightarrow (r^i, p^i, q^i, y^{i-1}) \leftrightarrow y(i)$, a Markov chain which is
satisfied in our case from the fact that $(q, s) \perp (r, p)$ and from Lemma [Q].
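
Steps (c) and (d) both rest on the chain rule for conditional mutual information,
$I(A; B, C \mid D) = I(A; C \mid D) + I(A; B \mid C, D)$, which is the form taken by (55) and (56). As a
sanity check only, the following minimal Python sketch verifies this identity numerically on an arbitrary
joint pmf of four binary variables. The helper names (`entropy`, `marginal`, `cond_mi`), the axis roles and
the randomly drawn pmf are illustrative assumptions and are not part of the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(p, keep):
    """Marginalize the joint pmf p onto the axes listed in `keep`."""
    drop = tuple(ax for ax in range(p.ndim) if ax not in keep)
    return p.sum(axis=drop)

def cond_mi(p, a, b, c=()):
    """I(A; B | C) in bits; a, b, c are collections of axis indices of p."""
    a, b, c = set(a), set(b), set(c)
    return (entropy(marginal(p, a | c)) + entropy(marginal(p, b | c))
            - entropy(marginal(p, a | b | c)) - entropy(marginal(p, c)))

# Arbitrary strictly positive joint pmf over four binary variables.
# Hypothetical axis roles: 0 -> A, 1 -> B, 2 -> C, 3 -> D.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 2))
p /= p.sum()

lhs = cond_mi(p, a=(0,), b=(1, 2), c=(3,))        # I(A; B, C | D)
rhs = (cond_mi(p, a=(0,), b=(2,), c=(3,))         # I(A; C | D)
       + cond_mi(p, a=(0,), b=(1,), c=(2, 3)))    # I(A; B | C, D)
print(f"I(A;B,C|D) = {lhs:.6f} bits, chain-rule sum = {rhs:.6f} bits")
```

Since every term in the identity is non-negative, a vanishing left-hand side forces both terms on the
right to vanish, which is the argument used after (56).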

Finally, since $(r^k, p^k) \perp (q^k, s^k)$, we have that the chain of equalities from (43) to (44) holds,
from which we conclude that

$$ I(r^k, p^k; y^k \mid q^k) = I(r^k, p^k; u^k) + I(r^k, p^k; y^k \mid u^k) + I(q^k; r^k \mid u^k, y^k). \tag{57} $$

Inserting this result into (54) and invoking Theorem 1, we arrive at equality (t) in (23).

To prove the first equality in (23), it suffices to notice that $I(x^k \to y^k \mid q^k)$ corresponds to the sum
on the right-hand side of (53), from where we proceed as with the first part. This completes the proof
of the theorem. ■
V. Potential Applications

Information inequalities and, in particular, the data-processing inequality, have played a fundamental
role in Information Theory and its applications [20]-[27]. It is perhaps the lack of a similar body of results
associated with the directed information (and with non-asymptotic, causal information transmission) which
has limited the extension of many important information-theoretic ideas and insights to situations involving
feedback or causality constraints [5], [28]. Two such areas, already mentioned in this paper, are the
understanding of the fundamental limitations arising in networked control systems over noiseless digital
channels, and causal rate distortion problems. In those contexts, causality is of paramount relevance and
thus the directed information appears, naturally, as the appropriate measure of information flow (see,
for example, @, Q, [11], [29], [30] and Q). We believe that our results might help in gaining insights
into the fundamental trade-offs underpinning those problems, and might also allow for the solution of
open problems such as, for instance, characterizing the minimal average data-rate that guarantees a given
performance level [10] (an improved version of the latter paper, which extensively uses the results derived
here, is currently under preparation by the authors). In a different vein, directed mutual information plays
a role akin to that of (standard) mutual information when characterizing channel feedback capacity (see,
e.g., El, H and the references therein). Our results may also play a role in expanding the understanding
of communication problems over channels used with feedback, particularly when including in the analysis
additional exogenous signals such as a random channel state, interference and, in general, any form of
side information. Thus, we hope that the inequalities and identities presented in Section III may help
in extending results such as dirty-paper coding [31], watermarking [32], distributed source coding [25],
[26], [33], [34], multi-terminal coding [35], [36], and data encryption [37], to scenarios involving causal
feedback.

VI. Conclusions 

In this paper, we have derived fundamental relations between mutual and directed informations in 
general discrete-time systems with feedback. The first of these is an inequality between the directed 
information between two signals inside the feedback loop and the mutual information involving a subset
of all the exogenous incoming signals. The latter result can be interpreted as a law of conservation of 
information flows for closed-loop systems. Crucial to establishing these bounds was the repeated use 
of chain rules for conditional mutual information as well as the development of new Markov chains. 
The proof techniques do not rely upon properties of entropies or distributions, and the results hold in 
very general cases including non-linear, time-varying and stochastic systems with arbitrarily distributed 
signals. Indeed, the only restriction is that all blocks within the system must be causal mappings, and that
their combined delay must be at least one sample. A new generalized data processing inequality was also 
proved, which is valid for nested directed informations within the loop. A key insight from
this inequality is that the further apart the signals are in the loop, the lower the directed information
between them. This closely resembles the behavior of mutual information in open-loop systems, where it
is well known that any independent processing of the signals can only reduce their mutual information. 
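
The open-loop property alluded to above is the classical data-processing inequality: if $x \to y \to z$
form a Markov chain, then $I(x; z) \le I(x; y)$. The following minimal Python sketch, which is only an
illustration with randomly drawn (assumed) distributions and hypothetical helper names, checks this
numerically for ternary variables. It does not compute directed information and does not model a
feedback loop.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(p, keep):
    """Marginalize the joint pmf p onto the axes listed in `keep`."""
    drop = tuple(ax for ax in range(p.ndim) if ax not in keep)
    return p.sum(axis=drop)

def cond_mi(p, a, b, c=()):
    """I(A; B | C) in bits; a, b, c are collections of axis indices of p."""
    a, b, c = set(a), set(b), set(c)
    return (entropy(marginal(p, a | c)) + entropy(marginal(p, b | c))
            - entropy(marginal(p, a | b | c)) - entropy(marginal(p, c)))

# Assumed (randomly drawn) open-loop chain x -> y -> z over ternary alphabets.
rng = np.random.default_rng(1)
p_x = rng.dirichlet(np.ones(3))                  # Pr{x}
p_y_given_x = rng.dirichlet(np.ones(3), size=3)  # row x holds Pr{y | x}
p_z_given_y = rng.dirichlet(np.ones(3), size=3)  # row y holds Pr{z | y}

# Joint pmf with axes (x, y, z): Pr{x} Pr{y | x} Pr{z | y}.
p = p_x[:, None, None] * p_y_given_x[:, :, None] * p_z_given_y[None, :, :]

i_xy = cond_mi(p, (0,), (1,))                # I(x; y)
i_xz = cond_mi(p, (0,), (2,))                # I(x; z)
i_xz_given_y = cond_mi(p, (0,), (2,), (1,))  # I(x; z | y), zero for a Markov chain
print(f"I(x;y) = {i_xy:.6f}, I(x;z) = {i_xz:.6f}, I(x;z|y) = {i_xz_given_y:.2e}")
```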

VII. Appendix 

Lemma 1: In the system shown in Fig. 5, the exogenous signals $r, q$ are mutually independent and
$\mathcal{S}_1, \mathcal{S}_2$ are deterministic (possibly time-varying) causal maps characterized by
$y^i = \mathcal{S}_1(r^i, u^i)$, $u^i = \mathcal{S}_2(q^i, y^i)$, $\forall i \in \{1, \dots, k\}$, for some
$k \in \mathbb{N}$. For this system, the following Markov chain holds:

$$ r^k \leftrightarrow (u^k, y^k) \leftrightarrow q^k, \quad \forall k \in \mathbb{N}. \tag{58} $$

Figure 5. Two arbitrary causal systems $\mathcal{S}_1, \mathcal{S}_2$ interconnected in a feedback loop. The
exogenous signals $r, q$ are mutually independent.

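Proof: Since $y^k = \mathcal{S}_1(r^k, u^k)$ and $u^k = \mathcal{S}_2(q^k, y^k)$ are deterministic functions, it
follows that for every possible pair of sequences (realizations) $\bar{y}^k, \bar{u}^k$, the sets
$\rho_{\bar{y}^k, \bar{u}^k} = \{r^k : \bar{y}^k = \mathcal{S}_1(r^k, \bar{u}^k)\}$ and
$\phi_{\bar{y}^k, \bar{u}^k} = \{q^k : \bar{u}^k = \mathcal{S}_2(q^k, \bar{y}^k)\}$ are also deterministic. Thus,
$(u^k, y^k) = (\bar{u}^k, \bar{y}^k) \iff r^k \in \rho_{\bar{y}^k, \bar{u}^k}$ and
$(u^k, y^k) = (\bar{u}^k, \bar{y}^k) \iff q^k \in \phi_{\bar{y}^k, \bar{u}^k}$. This means that for every pair of
Borel sets $(R, Q)$ of appropriate dimensions,

\begin{align*}
\Pr\{r^k \in R, q^k \in Q \mid y^k = \bar{y}^k, u^k = \bar{u}^k\}
&= \Pr\{r^k \in R, q^k \in Q \mid r^k \in \rho_{\bar{y}^k, \bar{u}^k}, q^k \in \phi_{\bar{y}^k, \bar{u}^k}\} \\
&= \Pr\{r^k \in R \mid r^k \in \rho_{\bar{y}^k, \bar{u}^k}, q^k \in \phi_{\bar{y}^k, \bar{u}^k}\}
   \Pr\{q^k \in Q \mid r^k \in (\rho_{\bar{y}^k, \bar{u}^k} \cap R), q^k \in \phi_{\bar{y}^k, \bar{u}^k}\} \\
&\overset{(a)}{=} \Pr\{r^k \in R \mid r^k \in \rho_{\bar{y}^k, \bar{u}^k}\}
   \Pr\{q^k \in Q \mid q^k \in \phi_{\bar{y}^k, \bar{u}^k}\} \\
&= \Pr\{r^k \in R \mid y^k = \bar{y}^k, u^k = \bar{u}^k\} \Pr\{q^k \in Q \mid y^k = \bar{y}^k, u^k = \bar{u}^k\},
\end{align*}

where (a) follows from the fact that $r^k \perp q^k$. This completes the proof. ■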
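
As an illustration of Lemma 1, the following minimal Python sketch enumerates a small binary feedback loop
with $k = 2$ and checks that $I(r^2; q^2 \mid u^2, y^2) = 0$. Everything specific in it is an assumption made
for the example: the input pmfs, the particular maps (with a one-sample delay placed in $\mathcal{S}_1$ so
that the loop is well posed), and the helper names.

```python
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(p, keep):
    """Marginalize the joint pmf p onto the axes listed in `keep`."""
    drop = tuple(ax for ax in range(p.ndim) if ax not in keep)
    return p.sum(axis=drop)

def cond_mi(p, a, b, c=()):
    """I(A; B | C) in bits; a, b, c are collections of axis indices of p."""
    a, b, c = set(a), set(b), set(c)
    return (entropy(marginal(p, a | c)) + entropy(marginal(p, b | c))
            - entropy(marginal(p, a | b | c)) - entropy(marginal(p, c)))

# Assumed input pmfs (illustrative values); r = (r1, r2) and q = (q1, q2) are independent.
p_r = np.array([[0.1, 0.2], [0.3, 0.4]])      # Pr{r1, r2}
p_q = np.array([[0.25, 0.25], [0.25, 0.25]])  # Pr{q1, q2}

# Joint pmf over the axes (r1, r2, q1, q2, u1, u2, y1, y2).
p = np.zeros((2,) * 8)
for r1, r2, q1, q2 in itertools.product((0, 1), repeat=4):
    # Hypothetical deterministic causal maps; S1 sees u with a one-sample delay.
    y1 = r1
    u1 = q1 | y1
    y2 = r2 & u1
    u2 = q2 ^ y2
    p[r1, r2, q1, q2, u1, u2, y1, y2] += p_r[r1, r2] * p_q[q1, q2]

R, Q, U, Y = (0, 1), (2, 3), (4, 5), (6, 7)
mi_given_uy = cond_mi(p, R, Q, U + Y)  # Lemma 1 predicts zero
mi_given_y = cond_mi(p, R, Q, Y)       # conditioning on y alone, for contrast
print(f"I(r;q|u,y) = {mi_given_uy:.2e} bits, I(r;q|y) = {mi_given_y:.4f} bits")
```

In this particular example, conditioning on $y^2$ alone leaves $r^2$ and $q^2$ dependent, whereas
conditioning on both $u^2$ and $y^2$ separates them, in agreement with (58).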